Search CORE

155 research outputs found

An Efficient Algorithm to Test Square-Freeness of Strings Compressed by Balanced Straight Line Program

Author: Inenaga Shunsuke
Matsubara Wataru
Shinohara Ayumi
Publication venue: Dagstuhl Seminar Proceedings. 08261 - Structure-Based Compression of Complex Massive Data
Publication date: 01/01/2008
Field of study

In this paper we study the problem of deciding whether a given compressed string contains a square. A string x is called a square if x = zz and z = u^k implies k = 1 and u = z. A string w is said to be square-free if no substrings of w are squares. Many efficient algorithms to test if a given string is square-free, have been developed so far. However, very little is known for testing square-freeness of a given compressed string. In this paper, we give an O(max(n^2; n log^2 N))-time O(n^2)-space solution to test square-freeness of a given compressed string, where n and N are the size of a given compressed string and the corresponding decompressed string, respectively. Our input strings are compressed by balanced straight line program (BSLP). We remark that BSLP has exponential compression, that is, N = O(2^n). Hence no decompress-then-test approaches can be better than our method in the worst case

Dagstuhl Research Online Publication Server

In-Place Bijective Burrows-Wheeler Transforms

Author: Hashimoto Daiki
Hendrian Diptarama
Shinohara Ayumi
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Publication date: 01/01/2020
Field of study

One of the most well-known variants of the Burrows-Wheeler transform (BWT) [Burrows and Wheeler, 1994] is the bijective BWT (BBWT) [Gil and Scott, arXiv 2012], which applies the extended BWT (EBWT) [Mantaci et al., TCS 2007] to the multiset of Lyndon factors of a given text. Since the EBWT is invertible, the BBWT is a bijective transform in the sense that the inverse image of the EBWT restores this multiset of Lyndon factors such that the original text can be obtained by sorting these factors in non-increasing order. In this paper, we present algorithms constructing or inverting the BBWT in-place using quadratic time. We also present conversions from the BBWT to the BWT, or vice versa, either (a) in-place using quadratic time, or (b) in the run-length compressed setting using ?(n lg r / lg lg r) time with ?(r lg n) bits of words, where r is the sum of character runs in the BWT and the BBWT

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Inferring Strings from Position Heaps in Linear Time

Author: Hendrian Diptarama
Kumagai Koshiro
Shinohara Ayumi
Yoshinaka Ryo
Publication venue
Publication date: 12/12/2022
Field of study

Position heaps are index structures of text strings used for the string matching problem. They are rooted trees whose edges and nodes are labeled and numbered, respectively. This paper is concerned with variants of the inverse problem of position heap construction and gives linear-time algorithms for those problems. The basic problem is to restore a text string from a rooted tree with labeled edges and numbered nodes. In the variant problems, the input trees may miss edge labels or node numbers which we must restore as well.Comment: 10 pages, 5 figure

arXiv.org e-Print Archive

Position Heaps for Parameterized Strings

Author: Diptarama Diptarama
Katsura Takashi
Narisawa Kazuyuki
Otomo Yuhei
Shinohara Ayumi
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 01/01/2017
Field of study

We propose a new indexing structure for parameterized strings, called parameterized position heap. Parameterized position heap is applicable for parameterized pattern matching problem, where the pattern matches a substring of the text if there exists a bijective mapping from the symbols of the pattern to the symbols of the substring. We propose an online construction algorithm of parameterized position heap of a text and show that our algorithm runs in linear time with respect to the text size. We also show that by using parameterized position heap, we can find all occurrences of a pattern in the text in linear time with respect to the product of the pattern size and the alphabet size

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server